Introduction

Materials and methods

Materials


We work with a breast cancer dataset obtained from the Kaggle website.

The summary below describes each column of the dataset:

##         patient_id          education     id_healthcenter id_treatment_region
##  111035895969:  1   Diploma      :253   1110000154: 14    1110000329:284     
##  111035896483:  1   Elementary   :113   1110000280: 11    1110000330:261     
##  111035897677:  1   Middle School: 97   1110000303: 11    1110000331:189     
##  111035897739:  1   Bachelor     : 82   1110000181: 10                       
##  111035897959:  1   Illiterate   : 79   1110000305: 10                       
##  111035898167:  1   High School  : 55   1110000224:  9                       
##  (Other)     :728   (Other)      : 55   (Other)   :669                       
##  hereditary_history   birth_date        age            weight      
##  0:310              Min.   :1944   Min.   :20.00   Min.   : 35.00  
##  1:424              1st Qu.:1978   1st Qu.:29.00   1st Qu.: 73.00  
##                     Median :1985   Median :34.00   Median : 79.00  
##                     Mean   :1982   Mean   :36.81   Mean   : 78.75  
##                     3rd Qu.:1990   3rd Qu.:41.00   3rd Qu.: 87.00  
##                     Max.   :1999   Max.   :75.00   Max.   :101.00  
##                     NA's   :2                                      
##  thickness_tumor  marital_status        marital_length pregnency_experience
##  Min.   :0.0100   0:139          above 10 years:409    0:144               
##  1st Qu.:0.4000   1:595          under 10 years:325    1:590               
##  Median :0.6000                                                            
##  Mean   :0.5757                                                            
##  3rd Qu.:0.8000                                                            
##  Max.   :1.3000                                                            
##                                                                            
##   giving_birth age_FirstGivingBirth abortion     blood     taking_heartMedicine
##  1      :364   above 30:428         0:594    A+     :176   0:281               
##  0      :137   under 30:306         1:140    A-     :124   1:453               
##  2      :128                                 AB+    :119                       
##  3      : 75                                 B+     :108                       
##  4      : 13                                 O+     : 78                       
##  5      : 12                                 (Other):128                       
##  (Other):  5                                 NA's   :  1                       
##  taking_blood_pressure_medicine taking_gallbladder_disease_medicine smoking
##  0:210                          0:343                               0:503  
##  1:524                          1:391                               1:231  
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  alcohol breast_pain radiation_history Birth_control  menstrual_age
##  0:454   0:280       0:366             0:261         above 12:304  
##  1:280   1:454       1:368             1:473         not yet :  3  
##                                                      under 12:427  
##                                                                    
##                                                                    
##                                                                    
##                                                                    
##   menopausal_age Benign_malignant_cancer           condition   treatment_age  
##  above 50: 36    Benign   :303           death          :351   Min.   :20.00  
##  not yet :644    Malignant:431           recovered      :132   1st Qu.:29.00  
##  under 50: 52                            under treatment:251   Median :34.00  
##  NA's    :  2                                                  Mean   :36.83  
##                                                                3rd Qu.:41.00  
##                                                                Max.   :75.00  
##                                                                NA's   :2

Cleaning the data




BEFORE                            AFTER
COLUMNS HAVE INCORRECT TYPES      EACH COLUMN HAS THE CORRECT TYPE
0, 1, 2 VALUES                    BOOLEAN VARIABLES
NAMES WITH \r\n                   CLEAN NAMES
BIRTH DATE WITH 3 CHARACTERS      BIRTH DATE WITH 4 CHARACTERS
BLOOD TYPE 44                     CORRECT BLOOD TYPES ONLY
WEIRD WEIGHT/AGE CORRELATIONS     PEOPLE UNDER 20 YEARS OLD AND PEOPLE UNDER 35 KG REMOVED
WOMEN AND MEN                     ONLY WOMEN
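
A few of the cleaning steps listed above could be sketched as follows. This is a minimal illustration in Python, not the code used for the analysis. The sample records are made up, the helper names `to_bool` and `clean_row` are hypothetical, and two choices are assumptions: values other than 0 or 1 in a flag column become missing, and unrecognised blood types become missing rather than causing the row to be dropped.

```python
# Assumption: the eight standard ABO/Rh blood types are the "correct" ones.
VALID_BLOOD_TYPES = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}
FLAG_COLUMNS = ("smoking", "alcohol")  # illustrative subset of the 0/1 columns


def to_bool(value):
    """Map 0/1 (as number or text) to False/True; anything else becomes None."""
    text = str(value).strip()
    if text == "1":
        return True
    if text == "0":
        return False
    return None  # assumption: unexpected codes such as 2 are treated as missing


def clean_row(row):
    """Return a cleaned copy of one record, or None if the record is dropped."""
    cleaned = {}
    for key, value in row.items():
        # Remove stray carriage returns and newlines from text fields.
        if isinstance(value, str):
            value = value.replace("\r", "").replace("\n", "").strip()
        cleaned[key] = value

    # Drop people under 20 years old and people under 35 kg.
    if cleaned["age"] < 20 or cleaned["weight"] < 35:
        return None

    # Keep only recognised blood types; other values become missing.
    if cleaned["blood"] not in VALID_BLOOD_TYPES:
        cleaned["blood"] = None

    for flag in FLAG_COLUMNS:
        cleaned[flag] = to_bool(cleaned[flag])
    return cleaned


# Made-up records for illustration only.
rows = [
    {"patient_id": "111\r\n", "age": 34, "weight": 79.0, "blood": "A+", "smoking": "1", "alcohol": "0"},
    {"patient_id": "112", "age": 17, "weight": 60.0, "blood": "O+", "smoking": "0", "alcohol": "0"},
    {"patient_id": "113", "age": 41, "weight": 87.0, "blood": "44", "smoking": "2", "alcohol": "1"},
]
cleaned = [c for c in map(clean_row, rows) if c is not None]
```

In this sketch the second record is removed by the age filter, while the third is kept with its blood type and smoking flag marked as missing.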

Augmenting the data


  • We have added more informative columns
  • We have changed the type of the columns

Statistical analysis


We created plots to explore the data and ran statistical analyses, including multiple correspondence analysis (MCA). The plots are shown in the "Results" section.
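
MCA operates on an indicator (one-hot) matrix of the categorical variables, so building that matrix is its first step. The sketch below shows only this step, in Python and with made-up records; the function name `indicator_matrix` is hypothetical, and the decomposition that follows in a full MCA is not shown.

```python
def indicator_matrix(records, columns):
    """One-hot encode the given categorical columns.

    Returns the header (one entry per column level) and one 0/1 row per record.
    """
    # Sort the levels of each column so the output order is deterministic.
    levels = {col: sorted({rec[col] for rec in records}) for col in columns}
    header = [f"{col}={lvl}" for col in columns for lvl in levels[col]]
    matrix = [
        [1 if rec[col] == lvl else 0 for col in columns for lvl in levels[col]]
        for rec in records
    ]
    return header, matrix


# Made-up records for illustration only.
records = [
    {"smoking": "0", "condition": "recovered"},
    {"smoking": "1", "condition": "death"},
    {"smoking": "0", "condition": "under treatment"},
]
header, matrix = indicator_matrix(records, ["smoking", "condition"])
```

Each row contains exactly one 1 per encoded variable, so every row sums to the number of variables.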

Results

Bar charts of categorical variables

  • Variables that affect health (medication, harmful habits) are highly prevalent among breast cancer patients.
  • Early menstrual periods (before age 12) and late menopause (after age 55) expose women to hormones for longer, raising their risk of breast cancer.

Plots

  • In most cases, deaths are higher among patients taking medication, which seems counterintuitive.
  • Not drinking alcohol or smoking improves recovery.
  • Deaths are lower among patients who drink alcohol and smoke, which also seems counterintuitive.
  • These are absolute counts; we should probably compute relative values as well.
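
The point about absolute versus relative values can be illustrated with a small sketch. A group with fewer patients can show fewer deaths in absolute terms and still have a higher death rate. The Python below uses made-up counts, not the dataset's, and the function name `outcome_shares` is hypothetical.

```python
from collections import Counter


def outcome_shares(records, group_col, outcome_col):
    """Return the share of each outcome within each group (shares sum to 1 per group)."""
    pair_counts = Counter((rec[group_col], rec[outcome_col]) for rec in records)
    group_totals = Counter(rec[group_col] for rec in records)
    return {
        (group, outcome): count / group_totals[group]
        for (group, outcome), count in pair_counts.items()
    }


# Made-up counts for illustration: 10 non-smokers and 4 smokers.
records = (
    [{"smoking": 0, "condition": "death"}] * 6
    + [{"smoking": 0, "condition": "recovered"}] * 4
    + [{"smoking": 1, "condition": "death"}] * 3
    + [{"smoking": 1, "condition": "recovered"}] * 1
)
shares = outcome_shares(records, "smoking", "condition")
```

In this made-up example smokers have fewer deaths in absolute terms (3 versus 6) but a higher share of deaths (0.75 versus 0.6), which is the kind of effect that relative values would reveal.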

Discussion


Conclusion

We have reached the following conclusions

Bibliography


  • Breast cancer incidence (invasive) statistics (10 March 2020). Cancer Research UK
  • Height Percentile Calculator for Men and Women in the United States. DQYDJ
  • Weight Gain After Breast Cancer Diagnosis May Be a Bigger Issue Than Thought in Australia (13 March 2020). BreastCancer.org

THANKS
FOR YOUR ATTENTION